On this page, we explore the National Walkability Index at the county-level across the United States. We also take a closer look at the walkability of New York City and conduct exploratory spatial data analysis.


Interactive Map with National Walkability Index

Map Overview

For the National Walkability Index, we wanted to go beyond the EPA’s current Walkability Map and created the following interactive spatial visualization that demonstrates county-level walkability scores and prevalence of health outcomes. We utilized many geographic information systems (GIS) concepts and synthesized skills learned from both the Data Science I (BIST P8105) and Public Health GIS (EHSC P8371) courses that we’re taking in the creation of this spatial visualization.

Our health outcomes of interest are:

  • High Cholesterol: High cholesterol among adults aged >=18 years who have been screened in the past 5 years
  • Physical Inactivity: No leisure-time physical activity among adults aged >=18 years
  • Mental Health: Mental health not good for >=14 days among adults aged >=18 years
  • Physical Health: Physical health not good for >=14 days among adults aged >=18 years
  • High Blood Pressure: High blood pressure among adults aged >=18 years

According to the EPA Methodology and User Guide, National Walkability Index scores on a scale of 1 to 20 are categorized as follows:

National Walkability Index Level
1 - 5.75 Least walkable
5.76 - 10.5 Below average walkable
10.51 - 15.25 Above average walkable
15.26 - 20 Most walkable
library(rjson)
library(leaflet)
library(tidyverse)
library(plotly)

# Read in cleaned datafile
walkability = read_csv("./data/merge_final.csv")

# Get the county-level .json file from the web
url = 'https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json'

counties = rjson::fromJSON(file = url)

# Set up the geographic projection for the map 
g = list(scope = 'usa',
         projection = list(type = 'albers usa'))

# Get the df for mean of walkability index by county
index_by_county = walkability %>% 
  mutate(fips_county = str_c(statefp, countyfp)) %>% 
  group_by(fips_county) %>% 
  summarise(ind_mean = round(mean(nat_walk_ind), 1))

# Get the df for mean of health outcome prevalence by county
hover_test_df = walkability %>%
  mutate(fips_county = str_c(statefp, countyfp)) %>% 
  group_by(fips_county, measure_id) %>%
  summarise(county_hlth = round(mean(data_value, na.rm = TRUE), 1)) %>%
  pivot_wider(names_from = "measure_id", values_from = "county_hlth") %>% 
  janitor::clean_names()

# Merge the two df by county-level FIPS
walkability_map = full_join(x = index_by_county, y = hover_test_df, by = "fips_county")

# Adding variable for hovertext
walkability_map$hover <- with(walkability_map, 
                            paste('<br>',
                                  "National Walkability Index (by County):", ind_mean, '<br>', 
                                  "% Health Outcomes (by County)", '<br>',
                                  "High Cholesterol:", highchol, '<br>',
                                  "Physical Inactivity:", lpa, '<br>',
                                  "Mental Health:", mhlth, '<br>',
                                  "Physical Health:", phlth, '<br>',
                                  "High Blood Pressure:", bphigh))

# Choropleth with hovertext of health outcomes by county by plotly
choropleth_map = plot_ly()

choropleth_map = choropleth_map %>% 
  add_trace(type = "choropleth",
            geojson = counties,
            locations = walkability_map$fips_county,
            z = walkability_map$ind_mean,
            text = walkability_map$hover,
            colorscale = "Viridis",
            reversescale = T, #to get reverse color scheme
            zmin = 1,
            zmax = 16,
            marker = list(line = list(width = 0)))

choropleth_map = choropleth_map %>%
  layout(title = "National Walkability Index in the US")

choropleth_map = choropleth_map %>% 
  layout(geo = g)

choropleth_map
Key upon hover
  • National Walkability Index (by county) [1,20)

  • National Walkability Index (by county) [1,20)
  • % of the county population reporting high cholesterol
  • % of the county population reporting physical inactivity
  • % of the county population reporting poor mental health
  • % of the county population reporting poor physical health
  • % of the county population reporting high blood pressure
  • Note: the county FIPS code is displayed on the right



Map Highlights

In our interactive map, the walkability index ranges from 1 to 16, with darker colors representing a higher walkability index and lighter colors representing a lower walkability index. Visually, most of the cities in the US have a relatively lower walkability index while most major metropolitan areas have a higher walkability index.

Compared to the EPA’s National Walkability Map, our interactive map provides additional information about the percentage of selective health outcomes. This interactive display of both walkability scores and various health outcomes across the US can be helpful for various audiences who might want to look at the spatial distribution of both simultaneously.

Exploratory Spatial Data Analysis for NYC

To further understand the National Walkability Index in smaller spatial units, we narrowed the exploration to New York City, where us five walkaholics reside.

The first step is to get a general understanding of the walkability index in the City by creating a descriptive choropleth (discrete density) map on the census tract level. We focused on the following counties:

  • Bronx County
  • Kings County
  • New York County
  • Queens County
  • Richmond County



Descriptive Choropleth Map for Walkability

library(sf)
library(tmap)
library(tidyverse)
library(leaflet)
library(shinyjs)

# Read-in shapefile
choropleth_NYC <- st_read("./data/cluster_analysis/map_with_everything.shp", quiet = TRUE)

# Clean up and filter
choropleth_NYC = choropleth_NYC %>% 
  janitor::clean_names() %>% 
  mutate(nat_walk_i = as.numeric(nat_walk_i),
        nat_walk_i = round(nat_walk_i, digits = 1)) %>% 
  filter(stusps == "NY",
         namelsadco %in% c("Bronx County", "Kings County", "New York County", "Queens County", "Richmond County")) %>% 
  rename(median_income = s1903_c03) %>% 
  mutate(median_income = as.numeric(median_income))

choropleth_NYC = choropleth_NYC[!is.na(choropleth_NYC$nat_walk_i),]

# Time to map
tmap_mode("view")

tm_basemap("CartoDB.Positron") +
  tm_shape(choropleth_NYC) +
  tm_polygons(col = "nat_walk_i", 
              style = "fisher",
              n = 5,
              title = "National Walkability Index",
              palette = ("Greens"))

From this choropleth map, we can see that New York City is walkable overall. However, we want to further identify census tracts that are statistically significant hot spots, cold spots, and spatial outliers using clustering analysis.

Clustering Analysis for Walkability

Understanding the terms

As mentioned, we want to find NYC census tracts that are statistically significant hot spots, cold spots, and spatial outliers. To start, it is crucial to understand the terms.

  • Hot spots/high positive clusters or Cold spots/low positive clusters
    • High-High: census tracts with high walkability that are surrounded by census tracts with a high walkability index
    • Low-Low: census tracts with low walkability that are surrounded by census tracts with a low walkability index
  • Spatial outliers:
    • High-Low: census tracts with high walkability that are surrounded by census tracts with a low walkability index
    • Low-High: census tracts with low walkability are surrounded by census tracts with high walkability index

Local Moran’s I

To find these clusters, we used local Moran’s I. The local spatial statistic Moran’s I is calculated for each zone based on the spatial weights object used. The values returned include a Z-value, and may be used as a diagnostic tool. The statistic is:


Essentially, the local Moran’s I tells us the local indicator of spatial association or LISA statistic, which is used as a cluster indicator to map different walkability clusters in the NYC.

Defining Neighborhood for Spatial Weights

For local Moran’s I, we also need to specify the spatial weights by defining neighborhood connectivity (what is considered a neighborhood). The goal of this step is to identify nearest census tract around a specific census tract. We used the Queen’s first order contiguity to establish neighborhood connectivity in the following analysis.

If a census tract is within the threshold distance of the selected census tract, a 1 is given, otherwise a 0 is given. Neighborhoods are created based on which observations are judged to be contiguous.

Queen’s first order contiguity suggests that, if a census tract share a vertex or a line segment with the selected census tract, they are given a weight of 1 (adjacent and are treated as a neighbor), else they are given a value 0 (nonadjacent and are not treated as a neighbor).

Visually, it looks like this:


Clearly defining contiguity is needed to conduct cluster analysis since we want to know what is considered as a neighboring census tract.

Clustering Map

By having a basic understanding, we can plot the clustering map of walkability in NYC using Queen’s first order contiguity and local Moran’s I.

library(rgeoda)
library(sf)
library(tmap)
library(tidyverse)
library(leaflet)

# Read-in files
cluster_map <- st_read("./data/cluster_analysis/map_with_everything.shp", quiet = TRUE)

# Clean up and filter
cluster_map_clean = cluster_map %>% 
  janitor::clean_names() %>% 
  mutate(nat_walk_i = as.numeric(nat_walk_i),
         nat_walk_i = round(nat_walk_i, digits = 1)) %>% 
  filter(stusps == "NY",
         namelsadco %in% c("Bronx County", "Kings County", "New York County", "Queens County", "Richmond County")) %>% 
    rename(sex_male = dp05_0002p,
           sex_female = dp05_0003p,
           age_18plus = dp05_0021p,
           race_white = dp05_0037p,
           race_black = dp05_0038p,
           race_aian = dp05_0039p,
           race_asian = dp05_0044p,
           race_2_plus = dp05_0058p,
           race_hispanic = dp05_0071p,
           median_income = s1903_c03) %>% 
  mutate(median_income = as.numeric(median_income))

cluster_map_walkability = cluster_map_clean[!is.na(cluster_map_clean$nat_walk_i),]
  
# Queen 1st order weights matrix 
queen_w <- queen_weights(cluster_map_walkability, order = 1)

# Local Moran's I
lisa <- local_moran(queen_w, cluster_map_walkability["nat_walk_i"],
                    permutations = 9999)

# Get cluster column and join it to shp
cluster_map_walkability$cluster <- as.factor(lisa$GetClusterIndicators())
str(cluster_map_walkability)
# Add labels to clusters
levels(cluster_map_walkability$cluster) <- lisa$GetLabels()

# We want the GeoDa colors
lisa_colors <- lisa_colors(lisa)

# Recode clusters using tidyverse (dyplr)
cluster_map_walkability %>% 
  mutate(cluster = recode_factor(.x = cluster,
                                  `Undefined` = "Not significant",
                                  `Isolated` = "Not significant")) -> cluster_map_walkability

# Map the clusters with a basemap
tmap_mode("view")

tm_basemap("CartoDB.Positron") +
  tm_shape(cluster_map_walkability) +
  tm_polygons(col = "cluster", 
              palette =  c("#eeeeee", 
                      "#FF0000",
                      "#0000FF",
                      "#a7adf9",
                      "#f4ada8"),
              alpha = .8,
              title = "Clusters",
              border.col = "black",
              border.alpha = .9) 

Highlights & Importance

Looking at Manhattan, we can see that there are several High-High clusterings (i.e., census tracts that with walkability that are surrounded by census tracts with a high walkability index) in the Midtown and SOHO area. Some parts of Queens, such as Flushing and Forest Hills, also indicates a High-High clustering.

However, parts of the Bronx shows the Low-Low clustering, where census tracts that are low in walkability are surrounded by census tracts with low walkability index. The Low-Low clustering in Brooklyn, on the other hand, might be due to the location of the JFK airport.

You might be wondering, what is the importance of finding these statistically significant hot spots and cold spots of walkability in the census tract level?

Reflecting back to the primary goal of the project, which is to highlight the connections between walkability, the built environment, and community-level health, identifying these clusters can help policy makers to put there attentions on those Low-Low areas. The High-High neighborhoods are doing a good job in building a walkable environment and the Low-Low areas will benefit a lot by improving built environment infrastructure to make the community more walkable for all.

Extra Resources

You can get more information about local Moran’s I here, and read about R package for analyzing spatial data.

We referenced this page when defining Queen’s first order.